CORAL: aligning conserved core regions across domain families
Abstract
MOTIVATION: Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile-profile method, CORAL, that aligns individual core regions as gap-free units.

RESULTS: CORAL computes an optimal local alignment of two profiles, with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved 'readability' that facilitate manual refinement.

AVAILABILITY: CORAL will be included in future versions of the NCBI Cn3D/CDTree software, which can be downloaded at http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
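To make the idea of aligning core regions as gap-free units concrete, the following is a minimal sketch of a local alignment in which gaps may not fall inside a core block. It is not CORAL's algorithm: the abstract does not give its scoring scheme or heuristics, so this toy assumes a precomputed column-by-column score matrix, a linear gap penalty, and per-column block labels (`None` marks a loop column). All function and parameter names here are hypothetical, and plain sequences with a match/mismatch score stand in for profiles.

```python
from typing import List, Optional, Sequence


def match_scores(a: str, b: str, match: float = 2.0, mismatch: float = -1.0) -> List[List[float]]:
    """Toy stand-in for profile column scores: +match if residues agree, else mismatch."""
    return [[match if x == y else mismatch for y in b] for x in a]


def splits_block(blocks: Sequence[Optional[int]], k: int) -> bool:
    """True if a gap placed right after column k (1-based) separates two columns of one block."""
    return 0 < k < len(blocks) and blocks[k - 1] is not None and blocks[k - 1] == blocks[k]


def block_constrained_local_score(
    score: List[List[float]],
    block_a: Sequence[Optional[int]],
    block_b: Sequence[Optional[int]],
    gap: float = -1.0,
) -> float:
    """Smith-Waterman-style best local score where no gap may interrupt a core block.

    A column may be left unaligned (placed opposite a gap) only if it is a loop
    column, and only if the gap does not sit between two columns of the same
    block in the other profile. Simplification: the local alignment may still
    start or end in the middle of a block.
    """
    n, m = len(block_a), len(block_b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Align column i of A with column j of B, or restart the local alignment.
            cell = max(0.0, H[i - 1][j - 1] + score[i - 1][j - 1])
            # Leave A column i unaligned: the gap sits after B column j.
            if block_a[i - 1] is None and not splits_block(block_b, j):
                cell = max(cell, H[i - 1][j] + gap)
            # Leave B column j unaligned: the gap sits after A column i.
            if block_b[j - 1] is None and not splits_block(block_a, i):
                cell = max(cell, H[i][j - 1] + gap)
            H[i][j] = cell
            best = max(best, cell)
    return best


if __name__ == "__main__":
    s = match_scores("ACGT", "ACT")
    # No blocks: the best alignment skips G with one gap.
    print(block_constrained_local_score(s, [None] * 4, [None] * 3))
    # All of A is one core block: the gap is forbidden, so the best alignment is gap-free.
    print(block_constrained_local_score(s, [0] * 4, [None] * 3))
```

In this toy case the unconstrained score is 5.0 (three matches and one gap), while marking the first sequence as a single block lowers it to 4.0, because the alignment must stay gap-free across the block. A fuller treatment would also handle alignments that begin or end inside a block, which this sketch leaves unconstrained.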